Distribution-Insensitive Parallel External Sorting on PC Clusters
Abstract
Many parallel external sorting algorithms have been reported, such as NOW-Sort, SPsort, and hill sort. All of them sort large-scale data stored on disk, but they differ in speed, throughput, and cost-effectiveness. Most of them handle data that are uniformly distributed over their value range, and few results have yet been reported on parallel external sorting of data with arbitrary distributions. In this paper, we present two distribution-insensitive parallel external sorting algorithms that use a sampling technique and histogram counts to distribute data evenly among processors, which in turn yields high performance. Experimental results on a cluster of Linux workstations show up to a 63% reduction in execution time compared with the earlier NOW-Sort.
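The abstract names two ingredients, random sampling and histogram counts, without saying exactly how they are combined. The Python sketch below shows one common way to use them (sample-sort style splitter selection) and is not the authors' algorithm. It simulates all processors in a single process, ignores disk I/O and network transfer, and uses hypothetical function names (`sample_splitters`, `equal_width_splitters`, `histogram_counts`, `redistribute_and_sort`, `imbalance`). Keys are drawn from a skewed exponential distribution, so equal-width range partitioning overloads one processor while sampled splitters keep the histogram counts close to even.

```python
import bisect
import random


def sample_splitters(local_data, num_procs, sample_per_proc, rng):
    """Choose num_procs - 1 splitters from a pooled random sample.

    Each simulated processor contributes up to sample_per_proc random keys;
    evenly spaced quantiles of the sorted pooled sample approximate the
    global quantiles regardless of the key distribution.
    """
    sample = []
    for chunk in local_data:
        sample.extend(rng.sample(chunk, min(sample_per_proc, len(chunk))))
    sample.sort()
    step = len(sample) / num_procs
    return [sample[int(step * i)] for i in range(1, num_procs)]


def equal_width_splitters(local_data, num_procs):
    """Naive baseline: cut the value range into equal-width intervals.

    This balances load only when keys are roughly uniform over the range.
    """
    lo = min(min(chunk) for chunk in local_data)
    hi = max(max(chunk) for chunk in local_data)
    width = (hi - lo) / num_procs
    return [lo + width * i for i in range(1, num_procs)]


def histogram_counts(local_data, splitters):
    """Global histogram: how many keys fall into each splitter bucket.

    On a real cluster each processor would count its own keys and the
    per-bucket counts would then be summed across processors.
    """
    counts = [0] * (len(splitters) + 1)
    for chunk in local_data:
        for key in chunk:
            counts[bisect.bisect_right(splitters, key)] += 1
    return counts


def redistribute_and_sort(local_data, splitters):
    """Send each key to the processor owning its bucket, then sort locally."""
    buckets = [[] for _ in range(len(splitters) + 1)]
    for chunk in local_data:
        for key in chunk:
            buckets[bisect.bisect_right(splitters, key)].append(key)
    return [sorted(bucket) for bucket in buckets]


def imbalance(counts):
    """Largest bucket size divided by the ideal (mean) bucket size."""
    return max(counts) / (sum(counts) / len(counts))


if __name__ == "__main__":
    rng = random.Random(42)
    num_procs = 4
    # Skewed (exponential) keys: far from uniform over their value range.
    local_data = [[rng.expovariate(1.0) for _ in range(25_000)]
                  for _ in range(num_procs)]

    naive = equal_width_splitters(local_data, num_procs)
    sampled = sample_splitters(local_data, num_procs, 1_000, rng)
    print("equal-width counts:", histogram_counts(local_data, naive))
    print("sampled counts:    ", histogram_counts(local_data, sampled))

    result = redistribute_and_sort(local_data, sampled)
    merged = [key for bucket in result for key in bucket]
    all_keys = [key for chunk in local_data for key in chunk]
    print("globally sorted:", merged == sorted(all_keys))
```

Because every key in bucket `i` lies between splitters `i - 1` and `i`, concatenating the locally sorted buckets in processor order gives a globally sorted result with no final merge step. Sampling makes the bucket boundaries follow the data's actual quantiles, which is what keeps the per-processor workload even for non-uniform inputs.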
Publication date: 2003